Multiple sclerosis (MS) is a chronic neuroinflammatory autoimmune disease. MS is characterized by demyelination and subsequent formation of lesions through out the central nervous system (CNS). Grey matter (GM) and white matter (WM) lesions in the brain show pathological differences. Microglia are CNS-resident and can function as antigen-presenting cells and phagocytes. Their role in MS is complex and controversial (Luo 2017). In this paper by van der Poel et. al, titled “Transcriptional profiling of human microglia reveals grey-white matter heterogeneity and multiple sclerosis-associated changes”, a total of 31 human microglia samples were analyzed from 10 MS donors (MS), including 5 grey matter (GM) and 10 white matter (WM) samples, and from 11 non-neurological control donors (CON), including 5 grey matter and 11 white matter samples. From each sample, the total RNA was extracted (Poel 2019).
The authors isolated human microglia according to the previously described general experimental design. They suggest that pathological changes in MS tissue may be associated with changes in microglia, leading to lesion initiation. The authors wanted to analyze the transcriptional profile of microglia in the collected samples by RNA-sequencing (Poel 2019). Ultimately, they wanted to discover MS-related changes in microglia between the GM and WM brain regions.
In A1, we did data cleaning, normalization, and mapped HUGO symbols.
We initially had 21283 genes and after filtering out those
with low counts (< 1 count/million), we were left with
15076 genes. TMM was used to normalize the counts. Our
genes were already mapped to HUGO symbols. With an MDS plot, we saw that
our samples clustered together by tissue, WM or GM, regardless of
whether the samples were from MS or CON patients.
knitr::include_graphics("~/projects/A3/figures/a1_mdsplot.png")
Figure 1. MDS plot of all MS and CON patient samples. Samples clustered together by tissue regardless of patient group.
In A2, we did differential expression analysis and preliminary
over-representation analysis (ORA). In the ORA, with comparison of the
two tissues, GM and WM - we found 3042 genes that were
differentially expressed in MS patients, and
2901 genes that were differentially expressed in
CON patients using the conventional p-value of 0.05. This
was a high number, so we reduced the p-value to 0.01.There
1621 genes for the MS patients and 1498 genes
for the CON patients.
knitr::include_graphics("~/projects/A3/figures/a2_heatmap.png")
Figure 2. Heat map of global gene expression across all samples.Samples were annotated based on their patient group and tissue on top of each column, its legend is found on the right. The heat map was coloured based on expression of each gene, where red is up-regulated and blue is down-regulated. Values were converted from counts to CPM and scaled. The samples were also hierarchically clustered by their expression (dendrograms for both rows and columns were excluded to keep the figure neat).
knitr::include_graphics("~/projects/A3/figures/a2_volcanoms.png")
knitr::include_graphics("~/projects/A3/figures/a2_volcanocon.png")
Figure 3. Volcano plot of differentially expressed genes comparing (A) grey matter and white matter tissues in MS patients, and (B) grey matter and white matter tissues in CON patients. Differentially expressed genes are coloured based on whether they are up- or down-regulated. Genes were selected when FDR < 0.05 AND logFC > 0 for up-regulated OR logFC < 0 for down-regulated. If they do not fulfill these requirements, they are labelled as not differentially expressed (DE).
if (! requireNamespace("BiocManager", quietly = TRUE)) {
install.packages("BiocManager")
}
if (! requireNamespace("limma", quietly = TRUE)) {
BiocManager::install("limma")
}
if (! requireNamespace("knitr", quietly = TRUE)) {
install.packages("knitr")
}
if (! requireNamespace("dplyr", quietly = TRUE)) {
install.packages("dplyr")
}
if (! requireNamespace("edgeR", quietly = TRUE)) {
BiocManager::install("edgeR")
}
if (! requireNamespace("ComplexHeatmap", quietly = TRUE)) {
BiocManager::install("ComplexHeatmap")
}
if (! requireNamespace("circlize", quietly = TRUE)) {
BiocManager::install("circlize")
}
if (!requireNamespace("gprofiler2", quietly = TRUE)) {
install.packages("gprofiler2")
}
if (!requireNamespace("tidyverse", quietly = TRUE)) {
install.packages("tidyverse")
}
if (! requireNamespace("fgsea", quietly = TRUE)) {
BiocManager::install("fgsea")
}
library("Biobase")
library("knitr")
library("edgeR")
library("limma")
library("dplyr")
library("ggplot2")
library("ComplexHeatmap")
library("circlize")
library("gprofiler2")
library("RColorBrewer")
library("kableExtra")
library("stringr")
We will be loading the results of our DE analysis from A2 as our input for this assignment.
# Comparison between GM and WM from MS
deg_con <- read.csv("~/projects/A3/data/results_con.csv", sep = ";", row.names = 1,
header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
library(tidyverse)
deg_con <- deg_con %>%
mutate_all(funs(parse_number(str_replace(., ",", "."))))
# Comparison between GM and WM from CON
deg_ms <- read.csv("~/projects/A3/data/results_ms.csv", sep = ";", row.names = 1,
header = TRUE, stringsAsFactors = FALSE, check.names = FALSE)
deg_ms <- deg_ms %>%
mutate_all(funs(parse_number(str_replace(., ",", "."))))
The loaded data was produced with the topTags function
from the edgeR package - consisting of the log FC, log CPM,
F-value, p-value, and adjusted p-value (FDR) for each DE gene.
We will perform GSEA on the the DE genes between GM and WM for each patient group separately. We will then compare our GSEA results with the ORA results we previously got from A2.
GSEA is a non-threshold method - since we use a ranked list of genes without an arbitrary threshold, we can perform an assessment of all genes in the list. GSEA is more consistent in identifying enriched gene sets and pathways, making it a more preferable method over ORA for analysis of two sets of DE genes under two different conditions (Abatangelo and Ancona 2009). In this report, the two different conditions will be the patient groups (MS/CON).
We will make a ranked list of genes to provide as an input into GSEA. A gene’s rank is calculated by the negative log10 of its p-value by the sign of the logFC.
GSEA is performed using the GSEA software Version 4.3.2,
using the predefined GSEAPreranked mode. We performed GSEA
analysis with the following parameters:
Max size and Min size represent the upper
and lower bounds of the gene set size that is included in the analysis.
This allows us to identify pathways that are specific enough for later
interpretation. Due to the nature of our chosen data set, we are
interested in understanding the biological processes that may be
involved in this process. We used the most recent version (March 2023)
of the
Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt
gene set from the Bader Lab at the University of Toronto. This can be
found here.
We specifically chose the gene set that has no annotations from
electronic annotations (IEA). This annotation file curates all GO:
Biological Process terms as well as pathways from multiple pathway
databases like Reactome (Vastrik et al.
2007) and MSigDB (Liberzon et al.
2011).
# Just going to name the gene column
deg_ms$Gene <- rownames(deg_ms)
# Constructing ranked gene list
deg_ms$rank <- (-log(deg_ms$PValue, base = 10)) * sign(deg_ms$logFC)
deg_ms_ranked <- dplyr::select(deg_ms, Gene, rank) %>%
arrange(desc(rank)) %>%
dplyr::select(Gene, rank)
# Save the ranked gene list
write.table(deg_ms_ranked, file = "~/projects/A3/data/deg_ms.rnk", sep = "\t", col.name = TRUE,
row.names = FALSE, quote = FALSE)
Configuration
# Defining paths
gsea_exe <- "~/projects/GSEA_4.3.2/gsea-cli.sh"
gsea_dir <- "~/projects/A3/data/MS"
gsea_gmt <- "~/projects/A3/data/Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt"
gsea_rnk <- "~/projects/A3/data/deg_ms.rnk"
# Permutations
perm <- 1000
# Max size
max_size <- 200
# Min size
min_size <- 15
# Analysis name
analysis_name <- "GMvsWM_MS"
Download the GMT file if not already downloaded.
if (!file.exists(gsea_gmt)) {
gmt_url <- "http://download.baderlab.org/EM_Genesets/March_02_2023/Human/symbol/Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt"
download.file(gmt_url, destfile = gsea_gmt)
}
Run GSEA.
cmd <- paste(gsea_exe, "GSEAPreranked",
"-gmx", gsea_gmt,
"-collapse false",
"-nperm", perm,
"-rnk", gsea_rnk,
"-scoring_scheme weighted",
"-rpt_label", analysis_name,
"-plot_top_x 20 -rnd_seed 12345",
"-set_max", max_size,
"-set_min", min_size,
"-zip_report false",
"-out", gsea_dir)
path1 <- "~/projects/A3/data/MS/GMvsWM_MS.GseaPreranked.1680973644388"
if (!file.exists(path1)) {
system(cmd)
}
Results
ms_up_reg <- read.table(file = "~/projects/A3/data/MS/GMvsWM_MS.GseaPreranked.1680973644388/gsea_report_for_na_pos_1680973644388.tsv",
sep = "\t", header = TRUE, fill = TRUE)
ms_neg_reg <- read.table(file = "~/projects/A3/data/MS/GMvsWM_MS.GseaPreranked.1680973644388/gsea_report_for_na_neg_1680973644388.tsv",
sep = "\t", header = TRUE, fill = TRUE)
Upregulated
# Prepare for display
MSupreg_table <- ms_up_reg %>%
dplyr::select(NAME, SIZE, ES, FDR.q.val) %>%
dplyr::rename(`Gene Set` = NAME, Size = SIZE, `Enrichment Score` = ES, `FDR q-value` = FDR.q.val) %>%
# Format NAME
dplyr::mutate(`Gene Set` = stringr::str_remove(`Gene Set`, "%.*$")) %>%
dplyr::mutate(`Gene Set` = stringr::str_to_sentence(`Gene Set`)) %>%
# Sort the table by FDR q-value
dplyr::arrange(`FDR q-value`)
# Display the table
DT::datatable(MSupreg_table, rownames = FALSE, filter = "top", options = list(ppageLength = 10,
scrollX = TRUE)) %>%
# Changing the font size of content
DT::formatStyle(names(MSupreg_table), fontSize = "12px")
Table 1. Enriched gene sets of the upregulated
genes in GM vs WM in the MS patient group.
Downregulated
# Prepare for display
MSnegreg_table <- ms_neg_reg %>%
dplyr::select(NAME, SIZE, ES, FDR.q.val) %>%
dplyr::rename(`Gene Set` = NAME, Size = SIZE, `Enrichment Score` = ES, `FDR q-value` = FDR.q.val) %>%
# Format NAME
dplyr::mutate(`Gene Set` = stringr::str_remove(`Gene Set`, "%.*$")) %>%
dplyr::mutate(`Gene Set` = stringr::str_to_sentence(`Gene Set`)) %>%
# Sort the table by FDR q-value
dplyr::arrange(`FDR q-value`)
# Display the table
DT::datatable(MSnegreg_table, rownames = FALSE, filter = "top", options = list(ppageLength = 10,
scrollX = TRUE)) %>%
# Changing the font size of content
DT::formatStyle(names(MSnegreg_table), fontSize = "12px")
Table 2. Enriched gene sets of the
downregulated genes in GM vs WM in the MS patient group.
Notably, our GSEA analysis identified 904 gene sets that were
associated with the upregulated genes in GM vs WM in MS patients but
r sum(MSupreg_table$`FDR q-value` < 0.25) were
significantly enriched (FDR < 0.25). While there were 4089 gene sets
enriched, and r sum(MSnegreg_table$`FDR q-value` < 0.25)
of these gene sets were significant (FDR < 0.25) in the downregulated
genes in GM vs WM in MS patients.
We can look at the top 5 gene sets over-represented at the top of the ranked gene list.
MSup_res <- data.frame(pathway = MSupreg_table$`Gene Set`, size = MSupreg_table$Size,
FDR = MSupreg_table$`FDR q-value`, es = MSupreg_table$`Enrichment Score`)
ggplot(MSup_res[1:5, ]) + geom_point(aes(x = es, color = FDR, y = pathway, size = size)) +
theme(axis.title.x = element_text(), axis.title.y = element_text()) + scale_color_gradient(low = "red",
high = "blue") + labs(x = "Enrichment Score", color = "FDR", size = "Genesize set",
y = NULL, title = "Top enriched gene sets in upregulated
genes in GM vs WM of MS patients")
Figure 1. Top 5 enriched gene sets identified by GSEA for the up-regulated genes. Dots are are coloured by FDR and their size is determined by the size of the gene set.
MSneg_res <- data.frame(pathway = MSnegreg_table$`Gene Set`, size = MSnegreg_table$Size,
FDR = MSnegreg_table$`FDR q-value`, es = MSnegreg_table$`Enrichment Score`)
ggplot(MSneg_res[1:5, ]) + geom_point(aes(x = es, color = FDR, y = pathway, size = size)) +
theme(axis.title.x = element_text(), axis.title.y = element_text()) + scale_color_gradient(low = "red",
high = "blue") + labs(x = "Enrichment Score", color = "FDR", size = "Genesize set",
y = NULL, title = "Top enriched gene sets in downregulated
genes in GM vs WM of MS patients")
Figure 2. Top 5 enriched gene sets identified by GSEA for the up-regulated genes. Dots are are coloured by FDR and their size is determined by the size of the gene set.
# Just going to name the gene column
deg_con$Gene <- rownames(deg_con)
# Constructing ranked gene list
deg_con$rank <- (-log(deg_con$PValue, base = 10)) * sign(deg_con$logFC)
deg_con_ranked <- dplyr::select(deg_con, Gene, rank) %>%
arrange(desc(rank)) %>%
dplyr::select(Gene, rank)
# Save the ranked gene list
write.table(deg_con_ranked, file = "~/projects/A3/data/deg_con.rnk", sep = "\t",
col.name = TRUE, row.names = FALSE, quote = FALSE)
Configuration
# Defining paths
gsea_exe <- "~/projects/GSEA_4.3.2/gsea-cli.sh"
gsea_dir <- "~/projects/A3/data/CON"
gsea_gmt <- "~/projects/A3/data/Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt"
gsea_rnk <- "~/projects/A3/data/deg_con.rnk"
# Permutations
perm <- 1000
# Max size
max_size <- 200
# Min size
min_size <- 15
# Analysis name
analysis_name <- "GMvsWM_CON"
Download the GMT file if not already downloaded.
if (!file.exists(gsea_gmt)) {
gmt_url <- "http://download.baderlab.org/EM_Genesets/March_02_2023/Human/symbol/Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt"
download.file(gmt_url, destfile = gsea_gmt)
}
Run GSEA.
cmd <- paste(gsea_exe, "GSEAPreranked",
"-gmx", gsea_gmt,
"-collapse false",
"-nperm", perm,
"-rnk", gsea_rnk,
"-scoring_scheme weighted",
"-rpt_label", analysis_name,
"-plot_top_x 20 -rnd_seed 12345",
"-set_max", max_size,
"-set_min", min_size,
"-zip_report false",
"-out", gsea_dir)
path2 <- "~/projects/A3/data/CON/GMvsWM_CON.GseaPreranked.1680999322997"
if (!file.exists(path2)) {
system(cmd)
}
Results
con_up_reg <- read.table(file = "~/projects/A3/data/CON/GMvsWM_CON.GseaPreranked.1680999322997/gsea_report_for_na_neg_1680999322997.tsv",
sep = "\t", header = TRUE, fill = TRUE)
con_neg_reg <- read.table(file = "~/projects/A3/data/CON/GMvsWM_CON.GseaPreranked.1680999322997/gsea_report_for_na_pos_1680999322997.tsv",
sep = "\t", header = TRUE, fill = TRUE)
Upregulated
# Prepare for display
CONupreg_table <- con_up_reg %>%
dplyr::select(NAME, SIZE, ES, FDR.q.val) %>%
dplyr::rename(`Gene Set` = NAME, Size = SIZE, `Enrichment Score` = ES, `FDR q-value` = FDR.q.val) %>%
# Format NAME
dplyr::mutate(`Gene Set` = stringr::str_remove(`Gene Set`, "%.*$")) %>%
dplyr::mutate(`Gene Set` = stringr::str_to_sentence(`Gene Set`)) %>%
# Sort the table by FDR q-value
dplyr::arrange(`FDR q-value`)
# Display the table
DT::datatable(CONupreg_table, rownames = FALSE, filter = "top", options = list(ppageLength = 10,
scrollX = TRUE)) %>%
# Changing the font size of content
DT::formatStyle(names(CONupreg_table), fontSize = "12px")
Table 3. Enriched gene sets of the upregulated
genes in GM vs WM in the CON patient group.
Downregulated
# Prepare for display
CONnegreg_table <- con_neg_reg %>%
dplyr::select(NAME, SIZE, ES, FDR.q.val) %>%
dplyr::rename(`Gene Set` = NAME, Size = SIZE, `Enrichment Score` = ES, `FDR q-value` = FDR.q.val) %>%
# Format NAME
dplyr::mutate(`Gene Set` = stringr::str_remove(`Gene Set`, "%.*$")) %>%
dplyr::mutate(`Gene Set` = stringr::str_to_sentence(`Gene Set`)) %>%
# Sort the table by FDR q-value
dplyr::arrange(`FDR q-value`)
# Display the table
DT::datatable(CONnegreg_table, rownames = FALSE, filter = "top", options = list(ppageLength = 10,
scrollX = TRUE)) %>%
# Changing the font size of content
DT::formatStyle(names(CONnegreg_table), fontSize = "12px")
Table 4. Enriched gene sets of the
downregulated genes in GM vs WM in the MS patient group.
Our GSEA analysis identified 3997 gene sets that were associated with
the upregulated genes in GM vs WM in MS patients but
r sum(CONupreg_table$`FDR q-value` < 0.25) were
significantly enriched (FDR < 0.25). While there were 996 gene sets
enriched, and
r sum(CONnegreg_table$`FDR q-value` < 0.25) of these
gene sets were significant (FDR < 0.25) in the downregulated genes in
GM vs WM in MS patients.
We can look at the top 5 gene sets over-represented at the top of the ranked gene list.
CONup_res <- data.frame(pathway = CONupreg_table$`Gene Set`, size = CONupreg_table$Size,
FDR = CONupreg_table$`FDR q-value`, es = CONupreg_table$`Enrichment Score`)
ggplot(CONup_res[1:5, ]) + geom_point(aes(x = es, color = FDR, y = pathway, size = size)) +
theme(axis.title.x = element_text(), axis.title.y = element_text()) + scale_color_gradient(low = "red",
high = "blue") + labs(x = "Enrichment Score", color = "FDR", size = "Genesize set",
y = NULL, title = "Top enriched gene sets in upregulated
genes in GM vs WM of CON patients")
Figure 6. Top 5 enriched gene sets identified by GSEA for the up-regulated genes. Dots are are coloured by FDR and their size is determined by the size of the gene set.
CONneg_res <- data.frame(pathway = CONnegreg_table$`Gene Set`, size = CONnegreg_table$Size,
FDR = CONnegreg_table$`FDR q-value`, es = CONnegreg_table$`Enrichment Score`)
ggplot(CONneg_res[1:5, ]) + geom_point(aes(x = es, color = FDR, y = pathway, size = size)) +
theme(axis.title.x = element_text(), axis.title.y = element_text()) + scale_color_gradient(low = "red",
high = "blue") + labs(x = "Enrichment Score", color = "FDR", size = "Genesize set",
y = NULL, title = "Top enriched gene sets in downregulated
genes in GM vs WM of CON patients")
Figure 7. Top 5 enriched gene sets identified by GSEA for the up-regulated genes. Dots are are coloured by FDR and their size is determined by the size of the gene set.
As mentioned earlier in this assignment, GSEA was performed using the
GSEA software Version 4.3.2 for
command line. We used the most recent version (March 2023) of the
Human_GOBP_AllPathways_no_GO_iea_March_02_2023_symbol.gmt
gene set from the Bader Lab at the University of Toronto. This can be
found here.
We identified 904 gene sets that were associated with the upregulated
genes in GM vs WM in MS patients but
r sum(MSupreg_table$`FDR q-value` < 0.25) were
significantly enriched (FDR < 0.25). For the downregulated genes,
there were 4089 gene sets enriched, and
r sum(MSnegreg_table$`FDR q-value` < 0.25) of these gene
sets were significant (FDR < 0.25). In Fig. 4, we
can see various gene sets that are involved in immunity such as
Regulation of type I Interferon-mediated signaling pathway.
In Fig. 5, a gene set that is particularly relevant to
this data set is Hallmark tnfa signaling via nfkb. In A2,
we found similar results with ORA and found that the NF-kB pathway is
essential in the pathology of MS (Mc Guire and
Loo 2013). It is interesting how this is downregulated in GM vs
WM.
We identified 3997 gene sets that were associated with the
upregulated genes in GM vs WM in MS patients but
r sum(CONupreg_table$`FDR q-value` < 0.25) were
significantly enriched (FDR < 0.25). For the downregulated genes,
there were 996 gene sets enriched, and
r sum(CONnegreg_table$`FDR q-value` < 0.25) of these
gene sets were significant (FDR < 0.25). Interestingly, our results
for this analysis (Fig. 6) is similar to what is seen
in Fig. 5, with
Hallmark tnfa signaling via nfkb in particular. For the MS
group, the ES is -0.7880487 while in the CON group, the ES
is -0.80684453. In Fig. 7, a gene set that
is particularly relevant to this data set is
Particular importance of Trna metabolic process..
In A2, we found:
MSupFilePath <- "~/projects/A3/figures/a2_msUpTopTerms.RDS"
a2_MSupResults <- readRDS(MSupFilePath)
head(a2_MSupResults)
Table 3. Analysis from A2: Enriched gene sets from thresholded GSEA (p-value < 0.01) using upregulated genes only.
When comparing this to what we found in our non-thresholded GSEA in Fig.4, we can see that A2’s results outlined in Table 3 has more immune mechanism-related terms concerning Toll-like receptors (TLRs). Fig.4 seems to have more vague and generic terms.
MSdownFilePath <- "~/projects/A3/figures/a2_msDownTopTerms.rds"
a2_MSdownResults <- readRDS(MSdownFilePath)
head(a2_MSdownResults)
Table 4. Analysis from A2: Enriched gene sets from thresholded GSEA (p-value < 0.01) using downregulated genes only.
When comparing this to what we found in our non-thresholded GSEA in
Fig.5, we can see that A2’s results outlined in
Table 4 has terms related to translation and protein
processes. There are a few translation-related terms in Fig.
5 but there is a particular gene set that stands out with the
largest size out of the top 5: the previously mentioned
Hallmark tnfa signaling via nfkb.
CONupFilePath <- "~/projects/A3/figures/a2_conUpTopTerms.RDS"
a2_CONupResults <- readRDS(CONupFilePath)
head(a2_CONupResults)
Table 5. Analysis from A2: Enriched gene sets from thresholded GSEA (p-value < 0.01) using upregulated genes only.
When comparing this to what we found in our non-thresholded GSEA in
Fig. 6, we can see that A2’s results outlined in
Table 5 has more immune mechanism-related terms such as
regulation of response and myeloid leukocyte mediated immunity. Similar
to what we found in A2, Fig. 6 has the term
Hallmark tnfa signaling via nfkb - this is similar to A2’s
positive regulation of I-kappaB kinase/NF-kappaB signaling.
CONdownFilePath <- "~/projects/A3/figures/a2_conDownTopTerms.rds"
a2_CONdownResults <- readRDS(CONdownFilePath)
head(a2_CONdownResults)
Table 6. Analysis from A2: Enriched gene sets from thresholded GSEA (p-value < 0.01) using downregulated genes only.
When comparing this to what we found in our non-thresholded GSEA in
Fig. 7, we can see that A2’s results outlined in
Table 6 has more generic terms related to apoptosis.
Fig. 7 has more specific terms such as
Rho GTPases activate NADPH oxidases and
Trna metabolic process.
While the results from the two different analyses are not the exact same, I do not think they are in disagreement as there are still a few similarities. These results also still align with experimental conditions. It is, however, important to note that this is NOT a straight forward comparison. Throughout this part of the assignment, we have only quantitatively compared and contrasted the top 5 terms of enriched gene sets. There might be other information that can affect our interpretation. Additionally, for the A2 results, we reduced our threshold to p-value < 0.01 as there were many terms that passed the conventional p-value < 0.05. This is unlike our non-thresholded GSEA in this assignment where we used the threshold FDR < 0.05. We also only used the annotation data GO:BP (releases/2022-12-04), Reactome (releases/2022-12-28), and WikiPathways (releases/2022-12-10) for A2. These collections are only a few of what is used in the gene set from the Bader Lab.
We have attempted to run EnrichmentMap from within R but ran into issues. Instead, we used the EnrichmentMap (3.3.5) app (Merico et al. 2010) in the Cytoscape 3.9.1 software (Shannon et al. 2003). The EnrichmentMap app needs to be installed separately. The default parameters were used and edge cutoff was set to 0.375 with the Jaccard + Overlap combined metric.
knitr::include_graphics("~/projects/A3/figures/MS_network.png")
knitr::include_graphics("~/projects/A3/figures/CON_network.png")
Figure 8. (A) Enrichment map of the dysregulated gene sets and pathways in MS patients, (B) Enrichment map of the dysregulated gene sets and pathways in CON patients. Both are prior to manual layout. Each node is a gene set/pathway and is coloured either red or blue to indicated whether it is up- or down-regulated respectively. Node sizes correspond to gene set size. Edges represent overlap between nodes and the edge width represents the number of genes that overlap.
The networks made in Fig. 8 alone are not enough to interpret and infer the biological themes of these gene sets. We then applied the AutoAnnotate app (1.4.0) (BaderLab 2016) in Cytoscape (Shannon et al. 2003) to annotate the Enrichment Map. We manually eliminated clusters and nodes to make it more readable. The MCL (Markov Cluster) clustering algorithm was used to annotate the common labels of nodes. It is efficient and effective at detecting clusters in large networks. We used the default parameters: max word per label leaving = 3, minimum word occurrence = 1.
knitr::include_graphics("~/projects/A3/figures/annoMS_network.png")
knitr::include_graphics("~/projects/A3/figures/annoCON_network.png")
Figure 9. (A) Annotated enrichment map of the dysregulated gene sets and pathways in MS patients, (B) Annotated enrichment map of the dysregulated gene sets and pathways in CON patients. Each node is a gene set/pathway and is coloured either red or blue to indicated whether it is up- or down-regulated respectively. Node sizes correspond to gene set size. Edges represent overlap between nodes and the edge width represents the number of genes that overlap. The size of annotation text corresponds to the number of gene sets in the annotation.
To produce a publication-ready figure, we excluded some themes that were irrelevant - too generic, little/no edges between nodes.
knitr::include_graphics("~/projects/A3/figures/pubReadyAnno_network.png")
Figure 10. Enrichment Map created with the EnrichmentMap (3.3.5) app in Cytoscape 3.9.1. Network was created with parameters FDR Q-value threshold < 0.01 and combined similarity cutoff > 0.375. Annotations were made with the AutoAnnotate app (1.4.0). Themes with little/no edges and generic terms were removed. (A) Annotated enrichment map of the dysregulated gene sets and pathways in MS patients, (B) Annotated enrichment map of the dysregulated gene sets and pathways in CON patients. Each node is a gene set/pathway and is coloured either red or blue to indicated whether it is up- or down-regulated respectively. Node sizes correspond to gene set size. Edges represent overlap between nodes and the edge width represents the number of genes that overlap. The size of annotation text corresponds to the number of gene sets in the annotation.
Fig. 9 gives us a better idea of the biological
themes of our gene sets. To better visualize these relationships, we
further collapsed the clusters into single nodes by selecting the
Collapse all option in the AutoAnnotate menu and used the
CoSE (Compound Spring Embedder) algorithm.
knitr::include_graphics("~/projects/A3/figures/collapseAnnoMS_network.png")
knitr::include_graphics("~/projects/A3/figures/collapseAnnoCON_network.png")
Figure 11. (A) Collapsed annotated enrichment map of the dysregulated gene sets and pathways in MS patients, (B) Collapsed annotated enrichment map of the dysregulated gene sets and pathways in CON patients. CoSE cluster layout was used. Each node is a gene set/pathway and is coloured either red or blue to indicated whether it is up- or down-regulated respectively. Node sizes correspond to gene set size. Edges represent overlap between nodes and the edge width represents the number of genes that overlap. The size of annotation text corresponds to the number of gene sets in the annotation.
Fig. 11 still seems to be a bit “busy”. We excluded some of the smaller individual nodes for both networks.
knitr::include_graphics("~/projects/A3/figures/legendExcludedNetworks.png")
Figure 12. (A) Collapsed annotated enrichment map of the dysregulated gene sets and pathways in MS patients, (B) Collapsed annotated enrichment map of the dysregulated gene sets and pathways in CON patients. CoSE cluster layout was used. Each node is a gene set/pathway and is coloured either red or blue to indicated whether it is up- or down-regulated respectively. Node sizes correspond to gene set size. Edges represent overlap between nodes and the edge width represents the number of genes that overlap. The size of annotation text corresponds to the number of gene sets in the annotation. Smaller nodes from Fig. 11 were excluded to produce these figures.